Smart Paper Notes

Averaging Weights Leads to Wider Optima and Better Generalization

Pavel Izmailov , Dmitrii Podoprikhin , Timur Garipov , Dmitry Vetrov , Andrew Gordon Wilson

Categories:

2018-03-14

Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence. We show that simple averaging of multiple points along the trajectory of SGD, with a cyclical or constant learning rate, leads to better generalization than conventional training. We also show that this Stochastic Weight Averaging (SWA) procedure finds much flatter solutions than SGD, and approximates the recent Fast Geometric Ensembling (FGE) approach with a single model. Using SWA we achieve notable improvement in test accuracy over conventional SGD training on a range of state-of-the-art residual networks, PyramidNets, DenseNets, and Shake-Shake networks on CIFAR-10, CIFAR-100, and ImageNet. In short, SWA is extremely easy to implement, improves generalization, and has almost no computational overhead.
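The averaging step that the abstract describes can be sketched as a running mean over weight snapshots collected along the SGD trajectory. This is a minimal NumPy illustration, not the authors' implementation; the function name `swa_average` and the toy snapshot values are hypothetical.

```python
import numpy as np

def swa_average(snapshots):
    """Return the running average of a sequence of weight vectors."""
    w_swa = None
    for n, w in enumerate(snapshots):
        w = np.asarray(w, dtype=float)
        if w_swa is None:
            # The first snapshot initializes the average.
            w_swa = w.copy()
        else:
            # Running mean over n previous snapshots plus the new one.
            w_swa = (w_swa * n + w) / (n + 1)
    return w_swa

# Toy snapshots standing in for weights captured during training (made-up values).
snapshots = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 0.0])]
averaged = swa_average(snapshots)
```

The running form avoids storing every snapshot: only the current average and a counter are kept, which is consistent with the abstract's claim of almost no computational overhead.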

translated by Google Translate

Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs

Timur Garipov , Pavel Izmailov , Dmitrii Podoprikhin , Dmitry Vetrov , Andrew Gordon Wilson

Categories:

2018-02-27

The loss functions of deep neural networks are complex and their geometric properties are not well understood. We show that the optima of these complex loss functions are in fact connected by simple curves over which training and test accuracy are nearly constant. We introduce a training procedure to discover these high-accuracy pathways between modes. Inspired by this new geometric insight, we also propose a new ensembling method entitled Fast Geometric Ensembling (FGE). Using FGE we can train high-performing ensembles in the time required to train a single model. We achieve improved performance compared to the recent state-of-the-art Snapshot Ensembles, on CIFAR-10, CIFAR-100, and ImageNet.

* Equal contribution.

1 Suppose we have three weight vectors $w_1, w_2, w_3$. We set $u = w_2 - w_1$ and $v = (w_3 - w_1) - \frac{\langle w_3 - w_1, w_2 - w_1 \rangle}{\|w_2 - w_1\|^2} (w_2 - w_1)$. Then the normalized vectors $\hat{u} = u / \|u\|$, $\hat{v} = v / \|v\|$ form an orthonormal basis in the plane containing $w_1, w_2, w_3$. To visualize the loss in this plane, we define a Cartesian grid in the basis $\hat{u}, \hat{v}$ and evaluate the networks corresponding to each of the points in the grid. A point $P$ with coordinates $(x, y)$ in the plane would then be given by $P = w_1 + x \hat{u} + y \hat{v}$.
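The plane construction in the footnote is a single Gram-Schmidt step followed by a change of coordinates. A minimal NumPy sketch follows; the function names `plane_basis` and `point_in_plane` and the three toy weight vectors are hypothetical, and evaluating the loss at each grid point is left out.

```python
import numpy as np

def plane_basis(w1, w2, w3):
    """Orthonormal basis (u_hat, v_hat) of the plane through w1, w2, w3."""
    u = w2 - w1
    d = w3 - w1
    # Remove from d its component along u (one Gram-Schmidt step).
    v = d - (np.dot(d, u) / np.dot(u, u)) * u
    return u / np.linalg.norm(u), v / np.linalg.norm(v)

def point_in_plane(w1, u_hat, v_hat, x, y):
    """Weight vector P with plane coordinates (x, y)."""
    return w1 + x * u_hat + y * v_hat

# Toy three-dimensional "weights" (made-up values).
w1 = np.array([0.0, 0.0, 0.0])
w2 = np.array([2.0, 0.0, 0.0])
w3 = np.array([1.0, 3.0, 0.0])
u_hat, v_hat = plane_basis(w1, w2, w3)

# Cartesian grid of weight vectors at which a network would be evaluated.
grid = [point_in_plane(w1, u_hat, v_hat, x, y)
        for x in np.linspace(-1.0, 3.0, 5)
        for y in np.linspace(-1.0, 4.0, 6)]
```

In these coordinates $w_1$ sits at the origin and $w_2$ at $(\|u\|, 0)$, so all three networks appear on the resulting loss plot.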

The Platform for non-metallic pipes defects recognition. Design and Implementation

Fabio Cacciatori , Sergei Nikolaev , Dmitrii Grigorev

Categories: Machine Learning

2022-12-09

This paper describes a prototype software and hardware platform that supports field operators during the inspection of surface defects of non-metallic pipes. Inspection is carried out by filming the defects on the pipe surface in real time using a "smart" helmet device and other mobile devices. The work focuses on the detection and recognition of defects that appear as colored iridescence of reflected light, caused by the diffraction effect arising from internal stresses in the inspected material. The platform allows preliminary analysis to be carried out directly on the device in offline mode; if a network connection is established, the collected data is transmitted to the server for post-processing to extract information about possible defects that were not detected at the previous stage. The paper presents the design stages, formal description, and implementation details of the platform. It also describes the models used to recognize defects and gives examples of the results.

Learning Spatio-Temporal Model of Disease Progression with NeuralODEs from Longitudinal Volumetric Data

Dmitrii Lachinov , Arunava Chakravarty , Christoph Grechenig , Ursula Schmidt-Erfurth , Hrvoje Bogunovic

Categories: Computer Vision | Machine Learning

2022-11-08

Robust forecasting of the future anatomical changes inflicted by an ongoing disease is an extremely challenging task that is out of grasp even for experienced healthcare professionals. Such a capability, however, is of great importance since it can improve patient management by providing information on the speed of disease progression already at the admission stage, or it can enrich the clinical trials with fast progressors and avoid the need for control arms by the means of digital twins. In this work, we develop a deep learning method that models the evolution of age-related disease by processing a single medical scan and providing a segmentation of the target anatomy at a requested future point in time. Our method represents a time-invariant physical process and solves a large-scale problem of modeling temporal pixel-level changes utilizing NeuralODEs. In addition, we demonstrate the approaches to incorporate the prior domain-specific constraints into our method and define temporal Dice loss for learning temporal objectives. To evaluate the applicability of our approach across different age-related diseases and imaging modalities, we developed and tested the proposed method on the datasets with 967 retinal OCT volumes of 100 patients with Geographic Atrophy, and 2823 brain MRI volumes of 633 patients with Alzheimer's Disease. For Geographic Atrophy, the proposed method outperformed the related baseline models in the atrophy growth prediction. For Alzheimer's Disease, the proposed method demonstrated remarkable performance in predicting the brain ventricle changes induced by the disease, achieving the state-of-the-art result on TADPOLE challenge.

Defining and Characterizing Reward Hacking

Joar Skalse , Nikolaus H. R. Howe , Dmitrii Krasheninnikov , David Krueger

Categories: Machine Learning | Machine Learning (Statistics)

2022-09-27

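We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function, $\tilde{\mathcal{R}}$, leads to poor performance according to the true reward function, $\mathcal{R}$. We say that a proxy is unhackable if increasing the expected proxy return can never decrease the expected true return. Intuitively, it might be possible to create an unhackable proxy by leaving some terms out of the reward function (making it "narrower") or by overlooking fine-grained distinctions between roughly equivalent outcomes, but we show that this is usually not the case. A key insight is that the linearity of reward (in state-action visit counts) makes unhackability a very strong condition. In particular, for the set of all stochastic policies, two reward functions can only be unhackable if one of them is constant. We therefore turn our attention to deterministic policies and finite sets of stochastic policies, where non-trivial unhackable pairs always exist, and establish necessary and sufficient conditions for the existence of simplifications, an important special case of unhackability. Our results reveal a tension between using reward functions to specify narrow tasks and aligning AI systems with human values.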
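For a finite policy set, the unhackability condition stated above can be checked directly by comparing every pair of policies. The sketch below is only an illustration under simplifying assumptions: each policy is represented solely by its expected proxy return and expected true return, the function name `is_hackable` is hypothetical, and the numbers are made up.

```python
from itertools import combinations

def is_hackable(proxy_returns, true_returns):
    """True if some pair of policies is ordered strictly oppositely
    by expected proxy return and expected true return."""
    for i, j in combinations(range(len(proxy_returns)), 2):
        proxy_diff = proxy_returns[i] - proxy_returns[j]
        true_diff = true_returns[i] - true_returns[j]
        # Opposite signs: proxy return goes up while true return goes down.
        if proxy_diff * true_diff < 0:
            return True
    return False

# Three policies; the proxy ignores a distinction but never reverses an order.
safe = is_hackable([1.0, 2.0, 3.0], [1.0, 1.0, 5.0])

# Moving from policy 0 to policy 1 raises proxy return but lowers true return.
unsafe = is_hackable([1.0, 2.0, 3.0], [2.0, 1.0, 5.0])
```

Ties are allowed in this check: a proxy that is indifferent between two policies that the true reward distinguishes does not count as hackable, because no strict increase in proxy return accompanies a decrease in true return.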

On Developing Facial Stress Analysis and Expression Recognition Platform

Fabio Cacciatori , Sergei Nikolaev , Dmitrii Grigorev

Categories: Computer Vision

2022-09-16

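This work describes the experimentation and development process behind facial expression recognition and facial stress analysis algorithms for an immersive digital learning platform. The system retrieves images from the user's webcam and evaluates them using artificial neural network (ANN) algorithms. The ANN output signals can be used to score and improve the learning process. Adapting an ANN to a new system can require significant implementation effort or repeated ANN training. There are also limitations related to the minimum hardware required to run an ANN. To overcome these constraints, several possible implementations of facial expression recognition and facial stress analysis algorithms are presented. Implementing the new solution made it possible to improve the accuracy of facial expression recognition and to increase its response speed. Experimental results show that the developed algorithms can detect heart rate at a higher speed than social equipment.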

Learning to correct spectral methods for simulating turbulent flows

Gideon Dresdner , Dmitrii Kochkov , Peter Norgaard , Leonardo Zepeda-Núñez , Jamie A. Smith , Michael P. Brenner , Stephan Hoyer

Categories: Machine Learning

2022-07-01

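Despite their ubiquity throughout science and engineering, only a handful of partial differential equations (PDEs) have analytical, or closed-form, solutions. This has motivated a vast amount of classical work on the numerical simulation of PDEs and, more recently, a whirlwind of research into data-driven techniques leveraging machine learning (ML). A recent line of work indicates that a hybrid of classical numerical techniques and machine learning can offer significant improvements over either approach alone. In this work, we show that the choice of numerical scheme is crucial when incorporating physics-based priors. We build upon Fourier-based spectral methods, which are considerably more efficient than other numerical schemes for simulating PDEs with smooth and periodic solutions. Specifically, we develop ML-augmented spectral solvers for three model PDEs of fluid dynamics, which improve upon the accuracy of standard spectral solvers at the same resolution. We also demonstrate a handful of key design principles for combining machine learning and numerical methods for solving PDEs.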
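To make the reference to Fourier-based spectral methods concrete, the sketch below shows only the classical building block: differentiating a smooth periodic function by multiplying its Fourier coefficients by $ik$. It is not the ML-augmented solver from the paper; the function name `spectral_derivative`, the grid size, and the test function are assumptions chosen for illustration.

```python
import numpy as np

def spectral_derivative(f_values, length):
    """Derivative of a periodic function sampled on a uniform grid of the given length."""
    n = f_values.size
    # Angular wavenumbers matching NumPy's FFT ordering.
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    # Differentiation in physical space is multiplication by i*k in Fourier space.
    return np.real(np.fft.ifft(1j * k * np.fft.fft(f_values)))

n = 64
length = 2.0 * np.pi
x = np.linspace(0.0, length, n, endpoint=False)

# Differentiate sin(3x) and compare with the exact derivative 3*cos(3x).
df = spectral_derivative(np.sin(3.0 * x), length)
max_error = np.max(np.abs(df - 3.0 * np.cos(3.0 * x)))
```

For smooth periodic inputs such as this one the error sits near floating-point round-off even on a coarse grid, which is the efficiency property the abstract attributes to spectral methods.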

SD-LayerNet: Semi-supervised retinal layer segmentation in OCT using disentangled representation with anatomical priors

Botond Fazekas , Guilherme Aresta , Dmitrii Lachinov , Sophie Riedl , Julia Mai , Ursula Schmidt-Erfurth , Hrvoje Bogunovic

Categories: Computer Vision

2022-07-01

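Optical coherence tomography (OCT) is a non-invasive 3D modality widely used in ophthalmology for imaging the retina. Achieving automated, anatomically coherent retinal layer segmentation on OCT is important for detecting and monitoring retinal diseases such as age-related macular disease (AMD) or diabetic retinopathy. However, most state-of-the-art layer segmentation methods are based on purely supervised deep learning and require a large amount of pixel-level annotated data, which is expensive and hard to obtain. With this in mind, we introduce a semi-supervised paradigm into the retinal layer segmentation task that makes use of the information present in large-scale unlabeled datasets as well as anatomical priors. In particular, a novel fully differentiable approach is used to convert surface position regression into a pixel-wise structured segmentation, allowing both 1D surface and 2D layer representations to be used in a coupled fashion to train the model. These 2D segmentations are used as anatomical factors that, together with learned style factors, compose disentangled representations used for reconstructing the input image. In parallel, we propose a set of anatomical priors to improve network training when only a limited amount of labeled data is available. We demonstrate on a real-world dataset of scans with intermediate and wet AMD that our method outperforms the state of the art when using our full training set, as well as when trained with only a fraction of the labeled data.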

Learning to generalize Dispatching rules on the Job Shop Scheduling

Zangir Iklassov , Dmitrii Medvedev , Ruben Solozabal , Martin Takac

Categories: Machine Learning | Artificial Intelligence

2022-06-09

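This paper introduces a reinforcement learning approach to better generalize heuristic dispatching rules on the Job Shop Scheduling Problem (JSP). Current models for the JSP do not focus on generalization, although, as we show in this work, it is key to learning better heuristics for the problem. A well-known technique to improve generalization is to learn on increasingly complex instances using Curriculum Learning (CL). However, as many works in the literature indicate, this technique can suffer from catastrophic forgetting when transferring learned skills between different problem sizes. To address this issue, we introduce a novel Adversarial Curriculum Learning (ACL) strategy, which dynamically adjusts the difficulty level during learning to revisit the worst-performing instances. This work also presents a deep learning model for solving the JSP that is equivariant w.r.t. the job definition and size-agnostic. Experiments on Taillard's and Demirkol's instances show that the proposed approach significantly improves on current state-of-the-art models for the JSP. It reduces the average optimality gap from 19.35% to 10.46% on Taillard's instances and from 38.43% to 18.85% on Demirkol's instances. Our implementation is available online.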

AI for Porosity and Permeability Prediction from Geologic Core X-Ray Micro-Tomography

Zangir Iklassov , Dmitrii Medvedev , Otabek Nazarov , Shakhboz Razzokov

Categories: Machine Learning | Artificial Intelligence | Computer Vision

2022-05-26

Geologic cores are rock samples extracted from deep underground during the well drilling process. They are used to characterize the performance of petroleum reservoirs. Traditionally, physical studies of cores are carried out through manual, time-consuming experiments. With the development of deep learning, scientists have begun developing machine-learning-based approaches to identify physical properties without any manual experiments. Several previous works used machine learning to determine the porosity and permeability of rocks, but the methods were either inaccurate or computationally expensive. We propose self-supervised pretraining of a very small CNN-transformer-based model to predict the physical properties of rocks with high accuracy in a time-efficient manner. We show that this technique prevents overfitting even for extremely small datasets. Github: https://github.com/Shahbozjon/porosity-and-permeability-prediction
